Viral dysbiosis in children with new-onset celiac disease

Viruses are common components of the intestinal microbiome, modulating host bacterial metabolism and interacting with the immune system, with a possible role in the pathogenesis of immune-mediated diseases such as celiac disease (CeD). The objective of this study was to characterize the virome profile in children with new-onset CeD. We used metagenomic analysis of viral DNA in mucosal and fecal samples from children with CeD and controls and performed sequencing using the Nextera XT library preparation kit. Abundance log2 fold changes were calculated using differential expression and linear discriminant effect size. Shannon alpha and Bray–Curtis beta diversity were determined. A total of 40 children with CeD and 39 controls were included. We found viral dysbiosis in both fecal and mucosal samples. Examples of significantly more abundant species in fecal samples of children with CeD included Human polyomavirus 2, Enterobacteria phage mEpX1, and Enterobacteria phage mEpX2; whereas less abundant species included Lactococcus phages ul36 and Streptococcus phage Abc2. In mucosal samples however, no species were significantly associated with CeD. Shannon alpha diversity was not significantly different between CeD and non-CeD groups and Bray–Curtis beta diversity showed no significant separation between CeD and non-CeD samples in either mucosal or stool samples, whereas separation was clear in all samples. We identified significant viral dysbiosis in children with CeD, suggesting a potential role in the pathogenesis of CeD indicating the need for further studies.


Introduction
Viruses are common components of the intestinal microbiome, among which, bacteriophages (viruses that infect bacteria) are the most common [1,2]. Bacteriophages  bacterial metabolism and interact with the immune system, triggering the innate and adaptive immune systems by employing similar mechanisms as those used by bacteria, suggesting a role in chronic inflammatory conditions [3,4]. Animal studies in mice have suggested a possible protective role of noroviruses in inflammatory bowel disease [5,6]. In humans, the association between intestinal viruses and inflammatory bowel disease (IBD) has been reported as well, characterizing viral dysbiosis in both Crohn's disease and ulcerative colitis [7][8][9][10]. Celiac disease (CeD) is an autoimmune enteropathy with pathologic similarity to IBD. In genetically susceptible individuals, exposure to gluten results in inflammatory cascade, leading to chronic intestinal injury [11][12][13]. However, gluten exposure alone does not trigger CeD in all genetically susceptible individuals, suggesting a role of additional factors. In this context, microbiota have been reported as one of the most important environmental factors in CeD. Bacterial dysbiosis in patients with CeD demonstrated decreased abundance of "beneficial bacteria" such as Bifidobacteria, Clostridia, and Lactobacilli and enrichment of potentially pathogenic bacteria such as Escherichia coli and Bacteroides [14][15][16]. Similarly, fungal dysbiosis, especially Saccharomyces and Candida, has been described in children with CeD [17][18][19][20] However, less is known at this time regarding the role of viruses within the CeD population. Major classes of human gut virome include bacteriophages, DNA eukaryotic viruses, and RNA eukaryotic viruses [21]. It has been suggested that infection with RNA and possibly DNA eukaryotic viruses may cause a transient disease resulting in loss of tolerance to gluten and development of CeD in susceptible individuals [22]. Bacteriophages, on the other hand, could trigger CeD directly or by modifying bacterial dysbiosis associated with CeD. In this study we used shotgun metagenomic DNA sequencing with the objective to characterize the profile of bacteriophages and DNA eukaryotic viruses in a cohort of children with newly diagnosed CeD. Accordingly, RNA eukaryotic viruses were not analyzed.

Study population
Children were enrolled from King Khalid University Hospital, King Saud University; King Fahad Medical City Children Hospital, Ministry of Health, and Al Mofarreh Polyclinics, a private medical institution, all of which are in Riyadh, the Kingdom of Saudi Arabia (KSA). The children attending the Gastroenterology Clinics were evaluated. Inclusion criteria in this study included age below 18 years, normal gluten containing diet, and no antibiotic exposure for at least 6 months. The study was introduced to the parents and children by one of the investigators, including explanation of all items in the informed consent form approved by the IRB. After signing the written consent form by one of the parents, the children were enrolled in the study. After complete investigations, the children with confirmed diagnosis of CeD were designated as cases and those in whom the diagnosis of CeD was excluded were designated as controls. Controls include healthy school children who gave stool samples. In total, there were two groups of children; the first group included those confirmed CeD who provided stool as well as mucosal samples (n = 20) or mucosal samples (n = 20). In this group, the diagnosis of CeD was confirmed according to the European Society of Pediatric Gastroenterology Hepatology and Nutrition guidelines [23]. The second group involved controls who gave stool samples, included a subgroup of 20 healthy schoolchildren who had negative anti-tissue transglutaminase-A (TTG-A) antibodies with normal serum immunoglobulin A levels taken from a larger random sample recruited for a mass screening study [24]. The other subgroup of non-CeD controls included 19 children who provided mucosal samples; these children presented with various symptoms but were TTG-A-negative with normal serum immunoglobulin A levels, normal endoscopy, and histopathology of the duodenal mucosa.

Sample collection, storage, and retrieval
Mucosal samples from 20 children with confirmed untreated CeD and 19 non-CD controls were collected from D2 in cryovials without fixative or stabilizer and transported in ice to the central laboratory. Similarly, fecal samples were collected in cryovials from 20 children with CeD and 20 healthy controls and transported in ice to the same. All samples were initially stored in a freezer at −80˚C; then, at the time of analysis were retrieved and dispatched by express mail in a temperature-controlled container filled with dry ice until delivery to the laboratory for metagenomic analysis (CosmosID, Rockville, MD, USA).

DNA extraction and sequencing
DNA was isolated from mucosa samples using the Zymobiomics miniprep kit (Zymo Research, Irvine, CA, USA) and from stool samples using the DNeasy PowerSoil DNA kit (Qiagen, Hilden, Germany), with each process done according to the manufacturer's instructions. Isolated DNA was quantified by Qubit (Thermo Fisher Scientific, Waltham, MA, USA).
DNA libraries were prepared using the Illumina Nextera XT library preparation kit, according to the manufacturer's protocol. Library quantity and quality were assessed with Qubit and Tapestation (Agilent Technologies, Santa, Clara, CA, USA). Libraries were then sequenced on an HiSeq platform (2 × 150 bp; Illumina, San Diego, CA, USA).

Bioinformatic and statistical analyses
Unassembled sequencing reads were directly analyzed with the CosmosID bioinformatics platform (CosmosID Inc., Rockville, MD, USA) described elsewhere for multi-kingdom microbiome analysis and quantification of each organism's relative abundance [25][26][27][28]. Briefly, the system uses curated genome databases and a high-performance data-mining algorithm that rapidly disambiguates hundreds of millions of metagenomic sequence reads into the discrete microorganisms engendering the particular sequences.
Abundance analysis. DESeq2 differential abundance analysis. Differential abundance analyses were generated starting with the abundance score matrices from the CosmosID taxonomic analysis. Differential abundance values for organisms were calculated using DESeq2 from the Phyloseq package for R (R Foundation for Statistical Computing, Vienna, Austria). For the mucosal and stool samples separately, the log2 fold change and associated P-values for CeD vs non-CeD are displayed [29,30].
Linear discriminant analysis (LDA) of effect size (LEfSe). LEfSe figures were generated using the Galaxy web application, based on relative abundance tables from CosmosID taxonomic analysis, and calculated with a Kruskal-Wallis alpha value of 0.05, a Wilcoxon alpha value of 0.05, and a logarithmic LDA score threshold of 2.0 [31].
Breadth and depth analysis. The breadth and depth of coverage of across the genome for each of the 39 genomes by each of the four cohorts (mucosa celiac, mucosa non-celiac, stool celiac, stool non-celiac). For virus species with significant differences identified, the genomes were downloaded via NCBI RefSeq for mapping. All metagenomic samples were randomly subsampled to 9 million total reads using the reformat.sh script within the bbtools suite.
All subsampled metagenomic files were mapped against each of the 39 references using bowtie2 (parameters: bowtie2 -D 20 -R 3 -N 1 -L 25 -i S,1,0.50 -f -t -x). Number of total bases in the query sample and number of bases covered against the reference sample were collected from the output bam files using samtools. Coverage breadth was calculated as reference bases covered divided by reference length. Coverage depth average was calculated as total query bases mapped divided by reference length. Cohort average values for each reference have been provided.
Diversity analysis. Alpha diversity. Alpha diversity boxplots were calculated from the species-level abundance-score matrices from the CosmosID taxonomic analysis. Shannon alpha diversity metrics were calculated using the Vegan package for R. Also, t-tests were performed between each the CeD and non-CeD groups using the ggsignif package for R. Finally, boxplots with overlaid significance in p-value format were generated using the ggplot2 package for R [32][33][34].
Beta diversity. Beta diversity two-dimensional principal coordinate analyses (PCoAs) were calculated from the species-level relative-abundance matrices from the CosmosID taxonomic analysis. Bray-Curtis diversity was calculated using the Vegan package for R with the vegdist function and PCoA tables were generated using the Vegan package's PCoA function. Plots were visualized using the ggpubr package for R [32,35].
For all analyses in this report, P-value were corrected for false discovery rate and the difference was considered significant when the P-value was < 0.05.

Ethical approval
This study was approved by the Institutional Review Board of the College of Medicine, King Saud University in Riyadh, Kingdom of Saudi Arabia (no. 14/4464/IRB). All parents gave written informed consent and since all children included in this study were below 18 years, one of the parents signed the written consent in their behalf.

The study population
The demographic and clinical characteristics of the study population of 40 children with CeD and 39 controls are presented in Table 1. Males accounted for 28% and 41% of the children with CeD and the control group, respectively, and the median ages at diagnosis were 10.3, 11.3 and 10.6 years for children with CeD, controls fecal and mucosal samples, respectively. The number of asymptomatic children with CeD was 15 of 40 (38%), while the remainder (n = 25) showed various combinations of symptoms including anemia, growth impairment, and abdominal pain. Stool samples were given by 20 healthy controls, whereas all 19 mucosal samples were provided by children with various symptoms, with functional abdominal pain being the most common final diagnosis. Abundance analysis. There were many more abundant taxa in the fecal than in the mucosal samples of children with CeD. Table 2 highlights significantly more abundant (log2 fold change > 0) and less abundant (log2 fold change < 0) taxa in fecal with corresponding abundance in mucosal samples. The LDA score, depicted in Fig 1, illustrates significant difference in abundance of taxa in stool samples between CeD and non-CeD controls. Significantly more abundant species in children with CeD included Human polyomavirus 2, Enterobacteria phage mEpX1, and Enterobacteria phage mEpX2; whereas more abundant species in non-CeD children (less abundant in children with CeD) included Lactococcus phages ul36, Lactococcus phage LF1, and Streptococcus phage Abc2. In mucosal samples, no taxa were significantly associated with CeD.
Breadth and depth analysis. The results are presented in an excel file with three tabs: 1."summarised_data" includes 156 rows, for each of the 39 genomes by each of the four cohorts (mucosa celiac, mucosa non-celiac, stool celiac, stool non-celiac). For each, the metadata and reference information have been provided, along with an average of the coverage depth average, and an average of the coverage breadth, 2."raw_data" includes the mapping information for each sample by each genome. This allows a deeper look at which sample contribute to the values, 3. "pivot_table" was used to aggregate the "raw_data" column into the "summarized_data" tab before the additional metadata columns were added to it (S1 Table).
In addition, a multifasta file shows the sequence of each of the 39 viral species identified with significant differences between CeD and controls (S2 Table).

Diversity analysis
Alpha diversity. The outcomes of Shannon alpha diversity analysis are presented in Fig 2, highlighting no significant difference existed in diversity between CeD and non-CeD mucosal (P = 0.65) or stool (P = 0.42) samples. However, there was a higher diversity in stool than mucosal samples in CeD samples as well as controls.
Beta diversity. Bray-Curtis diversity analysis results are illustrated in Figs 3 and 4, indicating almost no separation existed between CeD and non-CeD mucosal or stool samples.

Discussion
The role of viruses in health and disease is well-recognized. Despite the known ability of bacteriophages to infect bacteria and alter the microbiota structure, leading to microbial dysbiosis, studies on intestinal virome are still fragmented, potentially missing an important component of biological networks [36,37].
In this report, using whole genome analysis, we describe significant associations between viruses and CeD in a cohort of children in the KSA, a country with a high prevalence of CeD (1.5%) [38] as well as a high rate of CeD-predisposing HLA-DQ genotypes (52.7%) [39]. The fact that all children in this study were newly diagnosed and their samples were collected on normal diet containing gluten suggests strong association between the viruses observed and CeD.
To our knowledge, this is the first whole genome description of the virome profile in children with CeD from developing countries who have different culture and lifestyle from Western populations. The finding of significantly more abundant viruses in samples of children with CeD than in controls such as Human polyomavirus 2, Enterobacteria phage mEpX1, and Enterobacteria phage mEpX2 suggests a potential "harmful" role. On the other hand, the identification of significantly less abundant viruses such as Lactococcus phage ul36 suggests a potential "protective" role. Of particular interest is the finding of significantly more abundant Human polyomavirus 2 (commonly referred to as the JC virus or John Cunningham virus) in   that were not found in our children with CeD [42]. This discrepancy may be explained by differences in the age of children, the methodology, and geographic and population differences between the two studies. Taken together, these findings characterize the presence of viral dysbiosis in children with CeD and support previous reports suggesting a potential role of viruses in general and bacteriophages in particular in children with CeD [43]. Viruses can modify the risk of CeD by interaction with bacteria that affect the immune system. For example, in a study in rats, Enterobacteria and Bifidobacteria affected the permeability of gliadin-induced intestinal mucosa [44], and in humans, it has been shown that Bacteroides fragilis, E. coli, and Shigella could be risk factors that regulate the ability of monocyte recruitment to the mucosa to respond to gliadins and IFN-gamma in CD patients, influencing the course of the disease [45]. Several other possible mechanisms, including the role of some bacterial virulence factors such as microbial transglutaminase as a risk for CeD development, have been extensively reviewed [43]. However, as is true in most microbiome studies, the highly significant association of intestinal virome with CeD in this report does not imply causality.

Conclusion
The metagenomic analysis of the intestinal virome revealed statistically significant dysbiosis in children with celiac CeD. Further studies with functional analyses to define the relationship of bacteriophages to bacteria and to clarify the role of viruses in CeD might lead to the development of additional treatment options.
Supporting information S1